- 
                Notifications
    You must be signed in to change notification settings 
- Fork 15k
[Clang] Support includes translated to module imports in -header-include-filtering=direct-per-file #156756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Clang] Support includes translated to module imports in -header-include-filtering=direct-per-file #156756
Conversation
…ude-filtering=direct-per-file
| Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using  If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. | 
| @llvm/pr-subscribers-clang Author: Sina Mahdavi (sina-mahdavi) ChangesThis PR is a follow-up to #137087 that extends the output format generated by -header-include-filtering=direct-per-file to include information about the source location where those include/imports happended, as well as include information about the imported module when an include is translated to a module import. Full diff: https://github.com/llvm/llvm-project/pull/156756.diff 1 Files Affected: 
 diff --git a/clang/lib/Frontend/HeaderIncludeGen.cpp b/clang/lib/Frontend/HeaderIncludeGen.cpp
index 8ab335905f9f2..8de8d61b6262c 100644
--- a/clang/lib/Frontend/HeaderIncludeGen.cpp
+++ b/clang/lib/Frontend/HeaderIncludeGen.cpp
@@ -112,11 +112,17 @@ class HeaderIncludesJSONCallback : public PPCallbacks {
 /// an array of separate entries, one for each non-system source file used in
 /// the compilation showing only the direct includes and imports from that file.
 class HeaderIncludesDirectPerFileCallback : public PPCallbacks {
+  struct HeaderIncludeInfo {
+    SourceLocation location;
+    FileEntryRef file;
+    const Module *importedModule;
+  };
+
   SourceManager &SM;
   HeaderSearch &HSI;
   raw_ostream *OutputFile;
   bool OwnsOutputFile;
-  using DependencyMap = llvm::DenseMap<FileEntryRef, SmallVector<FileEntryRef>>;
+  using DependencyMap = llvm::DenseMap<FileEntryRef, SmallVector<HeaderIncludeInfo>>;
   DependencyMap Dependencies;
 
 public:
@@ -390,18 +396,43 @@ void HeaderIncludesDirectPerFileCallback::EndOfMainFile() {
   std::string Str;
   llvm::raw_string_ostream OS(Str);
   llvm::json::OStream JOS(OS);
-  JOS.array([&] {
-    for (auto S = SourceFiles.begin(), SE = SourceFiles.end(); S != SE; ++S) {
-      JOS.object([&] {
-        SmallVector<FileEntryRef> &Deps = Dependencies[*S];
-        JOS.attribute("source", S->getName().str());
-        JOS.attributeArray("includes", [&] {
-          for (unsigned I = 0, N = Deps.size(); I != N; ++I)
-            JOS.value(Deps[I].getName().str());
+  JOS.object([&] {
+    JOS.attribute("version", "2.0.0");
+    JOS.attributeArray("dependencies", [&] {
+      for (auto S = SourceFiles.begin(), SE = SourceFiles.end(); S != SE; ++S) {
+        JOS.object([&] {
+          SmallVector<HeaderIncludeInfo> &Deps = Dependencies[*S];
+          JOS.attribute("source", S->getName().str());
+          JOS.attributeArray("includes", [&] {
+            for (unsigned I = 0, N = Deps.size(); I != N; ++I) {
+              if (!Deps[I].importedModule) {
+                JOS.object([&] {
+                  PresumedLoc PLoc = SM.getPresumedLoc(Deps[I].location);
+                  std::string locationStr = PLoc.isInvalid() ? "<invalid>" : std::to_string(PLoc.getLine()) + ":" + std::to_string(PLoc.getColumn());
+                  JOS.attribute("location", locationStr);
+                  JOS.attribute("file", Deps[I].file.getName());
+                });
+              }
+            }
+          });
+          JOS.attributeArray("imports", [&] {
+            for (unsigned I = 0, N = Deps.size(); I != N; ++I) {
+              if (Deps[I].importedModule) {
+                JOS.object([&] {
+                  PresumedLoc PLoc = SM.getPresumedLoc(Deps[I].location);
+                  std::string locationStr = PLoc.isInvalid() ? "<invalid>" : std::to_string(PLoc.getLine()) + ":" + std::to_string(PLoc.getColumn());
+                  JOS.attribute("location", locationStr);
+                  JOS.attribute("module", Deps[I].importedModule->getTopLevelModuleName());
+                  JOS.attribute("file", Deps[I].file.getName());
+                });
+              }
+            }
+          });
         });
-      });
-    }
+      }
+    });
   });
+  
   OS << "\n";
 
   if (OutputFile->get_kind() == raw_ostream::OStreamKind::OK_FDStream) {
@@ -427,7 +458,19 @@ void HeaderIncludesDirectPerFileCallback::InclusionDirective(
   if (!FromFile)
     return;
 
-  Dependencies[*FromFile].push_back(*File);
+  FileEntryRef headerOrModule = *File;
+  if (ModuleImported && SuggestedModule) {
+    OptionalFileEntryRef ModuleMapFile = HSI.getModuleMap().getModuleMapFileForUniquing(SuggestedModule);
+    if (ModuleMapFile) {
+      headerOrModule = *ModuleMapFile;
+    }
+  }
+
+  Dependencies[*FromFile].push_back({
+    .location = Loc,
+    .file = headerOrModule,
+    .importedModule = (ModuleImported ? SuggestedModule : nullptr)
+  });
 }
 
 void HeaderIncludesDirectPerFileCallback::moduleImport(SourceLocation ImportLoc,
@@ -448,5 +491,9 @@ void HeaderIncludesDirectPerFileCallback::moduleImport(SourceLocation ImportLoc,
   if (!ModuleMapFile)
     return;
 
-  Dependencies[*FromFile].push_back(*ModuleMapFile);
+  Dependencies[*FromFile].push_back({
+    .location = Loc,
+    .file = *ModuleMapFile,
+    .importedModule = Imported
+  });
 }
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reviewed these changes and they look OK to me. This is a new option and as far as I know, swift-build is the only thing using it. Sina has a change to update swift-build to work with the new format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some tests for the new behavior.
| I think I've resolved all of the comments except adding tests in my new commit. I just need to write some tests. | 
| I changed the old test to expect the new output format, and added a new test too for modules. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks reasonable to me, but I'd like to make sure Jan is happy with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-authored-by: Jan Svoboda <jan@svoboda.ai>
| ✅ With the latest revision this PR passed the C/C++ code formatter. | 
d498d91    to
    abf1163      
    Compare
  
    | @sina-mahdavi Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! | 
…ude-filtering=direct-per-file (llvm#156756) This PR is a follow-up to llvm#137087 that extends the output format generated by -header-include-filtering=direct-per-file to include information about the source location where those include/imports happended, as well as include information about the imported module when an include is translated to a module import. rdar://161359514 (cherry-picked from commit 8df194f)
[Clang] Support includes translated to module imports in -header-include-filtering=direct-per-file (llvm#156756)
…ude-filtering=direct-per-file (llvm#156756) This PR is a follow-up to llvm#137087 that extends the output format generated by -header-include-filtering=direct-per-file to include information about the source location where those include/imports happended, as well as include information about the imported module when an include is translated to a module import.
| #include "header0.h" | ||
| #include "system2.h" | ||
|  | ||
| // RUN: rm %t.txt | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is brittle and in incremental build + test using the cached PCMs in %t after a binary change to the PCMs will result in memory corruption and random CI failures. I will send a fix for this.
cc @thurstond
A recent change adding a new sanitizer kind (via Sanitizer.defs) was reverted in c74fa20 ("Revert "[Clang][CodeGen] Introduce the AllocToken SanitizerKind" (llvm#162413)"). The reason was this ASan report, when running the test in clang/test/Preprocessor/print-header-json.c: ``` ==clang==483265==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7d82b97e8b58 at pc 0x562cd432231f bp 0x7fff3fad0850 sp 0x7fff3fad0848 READ of size 16 at 0x7d82b97e8b58 thread T0 #0 0x562cd432231e in __copy_non_overlapping_range<const unsigned long *, const unsigned long *> zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:2144:38 llvm#1 0x562cd432231e in void std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::__init_with_size[abi:nn220000]<unsigned long const*, unsigned long const*>(unsigned long const*, unsigned long const*, unsigned long) zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:2685:18 llvm#2 0x562cd41e2797 in __init<const unsigned long *, 0> zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:2673:3 llvm#3 0x562cd41e2797 in basic_string<const unsigned long *, 0> zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:1174:5 llvm#4 0x562cd41e2797 in clang::ASTReader::ReadString(llvm::SmallVectorImpl<unsigned long> const&, unsigned int&) clang/lib/Serialization/ASTReader.cpp:10171:15 llvm#5 0x562cd41fd89a in clang::ASTReader::ParseLanguageOptions(llvm::SmallVector<unsigned long, 64u> const&, llvm::StringRef, bool, clang::ASTReaderListener&, bool) clang/lib/Serialization/ASTReader.cpp:6475:28 llvm#6 0x562cd41eea53 in clang::ASTReader::ReadOptionsBlock(llvm::BitstreamCursor&, llvm::StringRef, unsigned int, bool, clang::ASTReaderListener&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) clang/lib/Serialization/ASTReader.cpp:3069:11 llvm#7 0x562cd4204ab8 in clang::ASTReader::ReadControlBlock(clang::serialization::ModuleFile&, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, clang::serialization::ModuleFile const*, unsigned int) clang/lib/Serialization/ASTReader.cpp:3249:15 llvm#8 0x562cd42097d2 in clang::ASTReader::ReadASTCore(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, clang::serialization::ModuleFile*, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, long, long, clang::ASTFileSignature, unsigned int) clang/lib/Serialization/ASTReader.cpp:5182:15 llvm#9 0x562cd421ec77 in clang::ASTReader::ReadAST(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, unsigned int, clang::serialization::ModuleFile**) clang/lib/Serialization/ASTReader.cpp:4828:11 llvm#10 0x562cd3d07b74 in clang::CompilerInstance::findOrCompileModuleAndReadAST(llvm::StringRef, clang::SourceLocation, clang::SourceLocation, bool) clang/lib/Frontend/CompilerInstance.cpp:1805:27 llvm#11 0x562cd3d0b2ef in clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<clang::IdentifierLoc>, clang::Module::NameVisibilityKind, bool) clang/lib/Frontend/CompilerInstance.cpp:1956:31 llvm#12 0x562cdb04eb1c in clang::Preprocessor::HandleHeaderIncludeOrImport(clang::SourceLocation, clang::Token&, clang::Token&, clang::SourceLocation, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2423:49 llvm#13 0x562cdb042222 in clang::Preprocessor::HandleIncludeDirective(clang::SourceLocation, clang::Token&, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2101:17 llvm#14 0x562cdb043366 in clang::Preprocessor::HandleDirective(clang::Token&) clang/lib/Lex/PPDirectives.cpp:1338:14 llvm#15 0x562cdafa84bc in clang::Lexer::LexTokenInternal(clang::Token&, bool) clang/lib/Lex/Lexer.cpp:4512:7 llvm#16 0x562cdaf9f20b in clang::Lexer::Lex(clang::Token&) clang/lib/Lex/Lexer.cpp:3729:24 llvm#17 0x562cdb0d4ffa in clang::Preprocessor::Lex(clang::Token&) clang/lib/Lex/Preprocessor.cpp:896:11 llvm#18 0x562cd77da950 in clang::ParseAST(clang::Sema&, bool, bool) clang/lib/Parse/ParseAST.cpp:163:7 [...] 0x7d82b97e8b58 is located 0 bytes after 3288-byte region [0x7d82b97e7e80,0x7d82b97e8b58) allocated by thread T0 here: #0 0x562cca76f604 in malloc zorg-test/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:67:3 llvm#1 0x562cd1cce452 in safe_malloc llvm/include/llvm/Support/MemAlloc.h:26:18 llvm#2 0x562cd1cce452 in llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long) llvm/lib/Support/SmallVector.cpp:151:15 llvm#3 0x562cdbe1768b in grow_pod llvm/include/llvm/ADT/SmallVector.h:139:11 llvm#4 0x562cdbe1768b in grow llvm/include/llvm/ADT/SmallVector.h:525:41 llvm#5 0x562cdbe1768b in reserve llvm/include/llvm/ADT/SmallVector.h:665:13 llvm#6 0x562cdbe1768b in llvm::BitstreamCursor::readRecord(unsigned int, llvm::SmallVectorImpl<unsigned long>&, llvm::StringRef*) llvm/lib/Bitstream/Reader/BitstreamReader.cpp:230:10 llvm#7 0x562cd41ee8ab in clang::ASTReader::ReadOptionsBlock(llvm::BitstreamCursor&, llvm::StringRef, unsigned int, bool, clang::ASTReaderListener&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) clang/lib/Serialization/ASTReader.cpp:3060:49 llvm#8 0x562cd4204ab8 in clang::ASTReader::ReadControlBlock(clang::serialization::ModuleFile&, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, clang::serialization::ModuleFile const*, unsigned int) clang/lib/Serialization/ASTReader.cpp:3249:15 llvm#9 0x562cd42097d2 in clang::ASTReader::ReadASTCore(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, clang::serialization::ModuleFile*, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, long, long, clang::ASTFileSignature, unsigned int) clang/lib/Serialization/ASTReader.cpp:5182:15 llvm#10 0x562cd421ec77 in clang::ASTReader::ReadAST(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, unsigned int, clang::serialization::ModuleFile**) clang/lib/Serialization/ASTReader.cpp:4828:11 llvm#11 0x562cd3d07b74 in clang::CompilerInstance::findOrCompileModuleAndReadAST(llvm::StringRef, clang::SourceLocation, clang::SourceLocation, bool) clang/lib/Frontend/CompilerInstance.cpp:1805:27 llvm#12 0x562cd3d0b2ef in clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<clang::IdentifierLoc>, clang::Module::NameVisibilityKind, bool) clang/lib/Frontend/CompilerInstance.cpp:1956:31 llvm#13 0x562cdb04eb1c in clang::Preprocessor::HandleHeaderIncludeOrImport(clang::SourceLocation, clang::Token&, clang::Token&, clang::SourceLocation, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2423:49 llvm#14 0x562cdb042222 in clang::Preprocessor::HandleIncludeDirective(clang::SourceLocation, clang::Token&, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2101:17 llvm#15 0x562cdb043366 in clang::Preprocessor::HandleDirective(clang::Token&) clang/lib/Lex/PPDirectives.cpp:1338:14 llvm#16 0x562cdafa84bc in clang::Lexer::LexTokenInternal(clang::Token&, bool) clang/lib/Lex/Lexer.cpp:4512:7 llvm#17 0x562cdaf9f20b in clang::Lexer::Lex(clang::Token&) clang/lib/Lex/Lexer.cpp:3729:24 llvm#18 0x562cdb0d4ffa in clang::Preprocessor::Lex(clang::Token&) clang/lib/Lex/Preprocessor.cpp:896:11 llvm#19 0x562cd77da950 in clang::ParseAST(clang::Sema&, bool, bool) clang/lib/Parse/ParseAST.cpp:163:7 [...] SUMMARY: AddressSanitizer: heap-buffer-overflow clang/lib/Serialization/ASTReader.cpp:10171:15 in clang::ASTReader::ReadString(llvm::SmallVectorImpl<unsigned long> const&, unsigned int&) ``` The reason is this particular RUN line: ``` // RUN: env CC_PRINT_HEADERS_FORMAT=json CC_PRINT_HEADERS_FILTERING=direct-per-file CC_PRINT_HEADERS_FILE=%t.txt %clang -fsyntax-only -I %S/Inputs/print-header-json -isystem %S/Inputs/print-header-json/system -fmodules -fimplicit-module-maps -fmodules-cache-path=%t %s -o /dev/null ``` which was added in 8df194f ("[Clang] Support includes translated to module imports in -header-include-filtering=direct-per-file (llvm#156756)"). The problem is caused by an incremental build reusing stale cached module files (.pcm) that are no longer binary-compatible with the updated compiler. Adding a new sanitizer option altered the implicit binary layout of the serialized LangOptions data structure. The build + test system is oblivious to such changes. When the new compiler attempted to read the old module file (from the previous test invocation), it misinterpreted the data due to the layout mismatch, resulting in a heap-buffer-overflow. Unfortunately Clang's PCM format does not encode nor detect version mismatches here; a more graceful failure mode would be preferable. For now, fix the test to be more robust with incremental build + test.
A recent change adding a new sanitizer kind (via Sanitizers.def) was reverted in c74fa20 ("Revert "[Clang][CodeGen] Introduce the AllocToken SanitizerKind" (#162413)"). The reason was this ASan report, when running the test cases in clang/test/Preprocessor/print-header-json.c: ``` ==clang==483265==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7d82b97e8b58 at pc 0x562cd432231f bp 0x7fff3fad0850 sp 0x7fff3fad0848 READ of size 16 at 0x7d82b97e8b58 thread T0 #0 0x562cd432231e in __copy_non_overlapping_range<const unsigned long *, const unsigned long *> zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:2144:38 #1 0x562cd432231e in void std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::__init_with_size[abi:nn220000]<unsigned long const*, unsigned long const*>(unsigned long const*, unsigned long const*, unsigned long) zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:2685:18 #2 0x562cd41e2797 in __init<const unsigned long *, 0> zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:2673:3 #3 0x562cd41e2797 in basic_string<const unsigned long *, 0> zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:1174:5 #4 0x562cd41e2797 in clang::ASTReader::ReadString(llvm::SmallVectorImpl<unsigned long> const&, unsigned int&) clang/lib/Serialization/ASTReader.cpp:10171:15 #5 0x562cd41fd89a in clang::ASTReader::ParseLanguageOptions(llvm::SmallVector<unsigned long, 64u> const&, llvm::StringRef, bool, clang::ASTReaderListener&, bool) clang/lib/Serialization/ASTReader.cpp:6475:28 #6 0x562cd41eea53 in clang::ASTReader::ReadOptionsBlock(llvm::BitstreamCursor&, llvm::StringRef, unsigned int, bool, clang::ASTReaderListener&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) clang/lib/Serialization/ASTReader.cpp:3069:11 #7 0x562cd4204ab8 in clang::ASTReader::ReadControlBlock(clang::serialization::ModuleFile&, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, clang::serialization::ModuleFile const*, unsigned int) clang/lib/Serialization/ASTReader.cpp:3249:15 #8 0x562cd42097d2 in clang::ASTReader::ReadASTCore(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, clang::serialization::ModuleFile*, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, long, long, clang::ASTFileSignature, unsigned int) clang/lib/Serialization/ASTReader.cpp:5182:15 #9 0x562cd421ec77 in clang::ASTReader::ReadAST(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, unsigned int, clang::serialization::ModuleFile**) clang/lib/Serialization/ASTReader.cpp:4828:11 #10 0x562cd3d07b74 in clang::CompilerInstance::findOrCompileModuleAndReadAST(llvm::StringRef, clang::SourceLocation, clang::SourceLocation, bool) clang/lib/Frontend/CompilerInstance.cpp:1805:27 #11 0x562cd3d0b2ef in clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<clang::IdentifierLoc>, clang::Module::NameVisibilityKind, bool) clang/lib/Frontend/CompilerInstance.cpp:1956:31 #12 0x562cdb04eb1c in clang::Preprocessor::HandleHeaderIncludeOrImport(clang::SourceLocation, clang::Token&, clang::Token&, clang::SourceLocation, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2423:49 #13 0x562cdb042222 in clang::Preprocessor::HandleIncludeDirective(clang::SourceLocation, clang::Token&, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2101:17 #14 0x562cdb043366 in clang::Preprocessor::HandleDirective(clang::Token&) clang/lib/Lex/PPDirectives.cpp:1338:14 #15 0x562cdafa84bc in clang::Lexer::LexTokenInternal(clang::Token&, bool) clang/lib/Lex/Lexer.cpp:4512:7 #16 0x562cdaf9f20b in clang::Lexer::Lex(clang::Token&) clang/lib/Lex/Lexer.cpp:3729:24 #17 0x562cdb0d4ffa in clang::Preprocessor::Lex(clang::Token&) clang/lib/Lex/Preprocessor.cpp:896:11 #18 0x562cd77da950 in clang::ParseAST(clang::Sema&, bool, bool) clang/lib/Parse/ParseAST.cpp:163:7 [...] 0x7d82b97e8b58 is located 0 bytes after 3288-byte region [0x7d82b97e7e80,0x7d82b97e8b58) allocated by thread T0 here: #0 0x562cca76f604 in malloc zorg-test/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:67:3 #1 0x562cd1cce452 in safe_malloc llvm/include/llvm/Support/MemAlloc.h:26:18 #2 0x562cd1cce452 in llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long) llvm/lib/Support/SmallVector.cpp:151:15 #3 0x562cdbe1768b in grow_pod llvm/include/llvm/ADT/SmallVector.h:139:11 #4 0x562cdbe1768b in grow llvm/include/llvm/ADT/SmallVector.h:525:41 #5 0x562cdbe1768b in reserve llvm/include/llvm/ADT/SmallVector.h:665:13 #6 0x562cdbe1768b in llvm::BitstreamCursor::readRecord(unsigned int, llvm::SmallVectorImpl<unsigned long>&, llvm::StringRef*) llvm/lib/Bitstream/Reader/BitstreamReader.cpp:230:10 #7 0x562cd41ee8ab in clang::ASTReader::ReadOptionsBlock(llvm::BitstreamCursor&, llvm::StringRef, unsigned int, bool, clang::ASTReaderListener&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) clang/lib/Serialization/ASTReader.cpp:3060:49 #8 0x562cd4204ab8 in clang::ASTReader::ReadControlBlock(clang::serialization::ModuleFile&, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, clang::serialization::ModuleFile const*, unsigned int) clang/lib/Serialization/ASTReader.cpp:3249:15 #9 0x562cd42097d2 in clang::ASTReader::ReadASTCore(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, clang::serialization::ModuleFile*, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, long, long, clang::ASTFileSignature, unsigned int) clang/lib/Serialization/ASTReader.cpp:5182:15 #10 0x562cd421ec77 in clang::ASTReader::ReadAST(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, unsigned int, clang::serialization::ModuleFile**) clang/lib/Serialization/ASTReader.cpp:4828:11 #11 0x562cd3d07b74 in clang::CompilerInstance::findOrCompileModuleAndReadAST(llvm::StringRef, clang::SourceLocation, clang::SourceLocation, bool) clang/lib/Frontend/CompilerInstance.cpp:1805:27 #12 0x562cd3d0b2ef in clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<clang::IdentifierLoc>, clang::Module::NameVisibilityKind, bool) clang/lib/Frontend/CompilerInstance.cpp:1956:31 #13 0x562cdb04eb1c in clang::Preprocessor::HandleHeaderIncludeOrImport(clang::SourceLocation, clang::Token&, clang::Token&, clang::SourceLocation, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2423:49 #14 0x562cdb042222 in clang::Preprocessor::HandleIncludeDirective(clang::SourceLocation, clang::Token&, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2101:17 #15 0x562cdb043366 in clang::Preprocessor::HandleDirective(clang::Token&) clang/lib/Lex/PPDirectives.cpp:1338:14 #16 0x562cdafa84bc in clang::Lexer::LexTokenInternal(clang::Token&, bool) clang/lib/Lex/Lexer.cpp:4512:7 #17 0x562cdaf9f20b in clang::Lexer::Lex(clang::Token&) clang/lib/Lex/Lexer.cpp:3729:24 #18 0x562cdb0d4ffa in clang::Preprocessor::Lex(clang::Token&) clang/lib/Lex/Preprocessor.cpp:896:11 #19 0x562cd77da950 in clang::ParseAST(clang::Sema&, bool, bool) clang/lib/Parse/ParseAST.cpp:163:7 [...] SUMMARY: AddressSanitizer: heap-buffer-overflow clang/lib/Serialization/ASTReader.cpp:10171:15 in clang::ASTReader::ReadString(llvm::SmallVectorImpl<unsigned long> const&, unsigned int&) ``` The reason is this particular RUN line: ``` // RUN: env CC_PRINT_HEADERS_FORMAT=json CC_PRINT_HEADERS_FILTERING=direct-per-file CC_PRINT_HEADERS_FILE=%t.txt %clang -fsyntax-only -I %S/Inputs/print-header-json -isystem %S/Inputs/print-header-json/system -fmodules -fimplicit-module-maps -fmodules-cache-path=%t %s -o /dev/null ``` which was added in 8df194f ("[Clang] Support includes translated to module imports in -header-include-filtering=direct-per-file (#156756)"). The problem is caused by an incremental build reusing stale cached module files (.pcm) that are no longer binary-compatible with the updated compiler. Adding a new sanitizer option altered the implicit binary layout of the serialized LangOptions data structure. The build + test system is oblivious to such changes. When the new compiler attempted to read the old module file (from the previous test invocation), it misinterpreted the data due to the layout mismatch, resulting in a heap-buffer-overflow. Unfortunately Clang's PCM format does not encode nor detect version mismatches here; a more graceful failure mode would be preferable. For now, fix the test to be more robust with incremental build + test.
A recent change adding a new sanitizer kind (via Sanitizers.def) was reverted in c74fa20 ("Revert "[Clang][CodeGen] Introduce the AllocToken SanitizerKind" (#162413)"). The reason was this ASan report, when running the test cases in clang/test/Preprocessor/print-header-json.c: ``` ==clang==483265==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7d82b97e8b58 at pc 0x562cd432231f bp 0x7fff3fad0850 sp 0x7fff3fad0848 READ of size 16 at 0x7d82b97e8b58 thread T0 #0 0x562cd432231e in __copy_non_overlapping_range<const unsigned long *, const unsigned long *> zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:2144:38 #1 0x562cd432231e in void std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::__init_with_size[abi:nn220000]<unsigned long const*, unsigned long const*>(unsigned long const*, unsigned long const*, unsigned long) zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:2685:18 #2 0x562cd41e2797 in __init<const unsigned long *, 0> zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:2673:3 #3 0x562cd41e2797 in basic_string<const unsigned long *, 0> zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:1174:5 #4 0x562cd41e2797 in clang::ASTReader::ReadString(llvm::SmallVectorImpl<unsigned long> const&, unsigned int&) clang/lib/Serialization/ASTReader.cpp:10171:15 #5 0x562cd41fd89a in clang::ASTReader::ParseLanguageOptions(llvm::SmallVector<unsigned long, 64u> const&, llvm::StringRef, bool, clang::ASTReaderListener&, bool) clang/lib/Serialization/ASTReader.cpp:6475:28 #6 0x562cd41eea53 in clang::ASTReader::ReadOptionsBlock(llvm::BitstreamCursor&, llvm::StringRef, unsigned int, bool, clang::ASTReaderListener&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) clang/lib/Serialization/ASTReader.cpp:3069:11 #7 0x562cd4204ab8 in clang::ASTReader::ReadControlBlock(clang::serialization::ModuleFile&, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, clang::serialization::ModuleFile const*, unsigned int) clang/lib/Serialization/ASTReader.cpp:3249:15 #8 0x562cd42097d2 in clang::ASTReader::ReadASTCore(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, clang::serialization::ModuleFile*, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, long, long, clang::ASTFileSignature, unsigned int) clang/lib/Serialization/ASTReader.cpp:5182:15 #9 0x562cd421ec77 in clang::ASTReader::ReadAST(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, unsigned int, clang::serialization::ModuleFile**) clang/lib/Serialization/ASTReader.cpp:4828:11 #10 0x562cd3d07b74 in clang::CompilerInstance::findOrCompileModuleAndReadAST(llvm::StringRef, clang::SourceLocation, clang::SourceLocation, bool) clang/lib/Frontend/CompilerInstance.cpp:1805:27 #11 0x562cd3d0b2ef in clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<clang::IdentifierLoc>, clang::Module::NameVisibilityKind, bool) clang/lib/Frontend/CompilerInstance.cpp:1956:31 #12 0x562cdb04eb1c in clang::Preprocessor::HandleHeaderIncludeOrImport(clang::SourceLocation, clang::Token&, clang::Token&, clang::SourceLocation, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2423:49 #13 0x562cdb042222 in clang::Preprocessor::HandleIncludeDirective(clang::SourceLocation, clang::Token&, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2101:17 #14 0x562cdb043366 in clang::Preprocessor::HandleDirective(clang::Token&) clang/lib/Lex/PPDirectives.cpp:1338:14 #15 0x562cdafa84bc in clang::Lexer::LexTokenInternal(clang::Token&, bool) clang/lib/Lex/Lexer.cpp:4512:7 #16 0x562cdaf9f20b in clang::Lexer::Lex(clang::Token&) clang/lib/Lex/Lexer.cpp:3729:24 #17 0x562cdb0d4ffa in clang::Preprocessor::Lex(clang::Token&) clang/lib/Lex/Preprocessor.cpp:896:11 #18 0x562cd77da950 in clang::ParseAST(clang::Sema&, bool, bool) clang/lib/Parse/ParseAST.cpp:163:7 [...] 0x7d82b97e8b58 is located 0 bytes after 3288-byte region [0x7d82b97e7e80,0x7d82b97e8b58) allocated by thread T0 here: #0 0x562cca76f604 in malloc zorg-test/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:67:3 #1 0x562cd1cce452 in safe_malloc llvm/include/llvm/Support/MemAlloc.h:26:18 #2 0x562cd1cce452 in llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long) llvm/lib/Support/SmallVector.cpp:151:15 #3 0x562cdbe1768b in grow_pod llvm/include/llvm/ADT/SmallVector.h:139:11 #4 0x562cdbe1768b in grow llvm/include/llvm/ADT/SmallVector.h:525:41 #5 0x562cdbe1768b in reserve llvm/include/llvm/ADT/SmallVector.h:665:13 #6 0x562cdbe1768b in llvm::BitstreamCursor::readRecord(unsigned int, llvm::SmallVectorImpl<unsigned long>&, llvm::StringRef*) llvm/lib/Bitstream/Reader/BitstreamReader.cpp:230:10 #7 0x562cd41ee8ab in clang::ASTReader::ReadOptionsBlock(llvm::BitstreamCursor&, llvm::StringRef, unsigned int, bool, clang::ASTReaderListener&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) clang/lib/Serialization/ASTReader.cpp:3060:49 #8 0x562cd4204ab8 in clang::ASTReader::ReadControlBlock(clang::serialization::ModuleFile&, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, clang::serialization::ModuleFile const*, unsigned int) clang/lib/Serialization/ASTReader.cpp:3249:15 #9 0x562cd42097d2 in clang::ASTReader::ReadASTCore(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, clang::serialization::ModuleFile*, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, long, long, clang::ASTFileSignature, unsigned int) clang/lib/Serialization/ASTReader.cpp:5182:15 #10 0x562cd421ec77 in clang::ASTReader::ReadAST(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, unsigned int, clang::serialization::ModuleFile**) clang/lib/Serialization/ASTReader.cpp:4828:11 #11 0x562cd3d07b74 in clang::CompilerInstance::findOrCompileModuleAndReadAST(llvm::StringRef, clang::SourceLocation, clang::SourceLocation, bool) clang/lib/Frontend/CompilerInstance.cpp:1805:27 #12 0x562cd3d0b2ef in clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<clang::IdentifierLoc>, clang::Module::NameVisibilityKind, bool) clang/lib/Frontend/CompilerInstance.cpp:1956:31 #13 0x562cdb04eb1c in clang::Preprocessor::HandleHeaderIncludeOrImport(clang::SourceLocation, clang::Token&, clang::Token&, clang::SourceLocation, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2423:49 #14 0x562cdb042222 in clang::Preprocessor::HandleIncludeDirective(clang::SourceLocation, clang::Token&, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2101:17 #15 0x562cdb043366 in clang::Preprocessor::HandleDirective(clang::Token&) clang/lib/Lex/PPDirectives.cpp:1338:14 #16 0x562cdafa84bc in clang::Lexer::LexTokenInternal(clang::Token&, bool) clang/lib/Lex/Lexer.cpp:4512:7 #17 0x562cdaf9f20b in clang::Lexer::Lex(clang::Token&) clang/lib/Lex/Lexer.cpp:3729:24 #18 0x562cdb0d4ffa in clang::Preprocessor::Lex(clang::Token&) clang/lib/Lex/Preprocessor.cpp:896:11 #19 0x562cd77da950 in clang::ParseAST(clang::Sema&, bool, bool) clang/lib/Parse/ParseAST.cpp:163:7 [...] SUMMARY: AddressSanitizer: heap-buffer-overflow clang/lib/Serialization/ASTReader.cpp:10171:15 in clang::ASTReader::ReadString(llvm::SmallVectorImpl<unsigned long> const&, unsigned int&) ``` The reason is this particular RUN line: ``` // RUN: env CC_PRINT_HEADERS_FORMAT=json CC_PRINT_HEADERS_FILTERING=direct-per-file CC_PRINT_HEADERS_FILE=%t.txt %clang -fsyntax-only -I %S/Inputs/print-header-json -isystem %S/Inputs/print-header-json/system -fmodules -fimplicit-module-maps -fmodules-cache-path=%t %s -o /dev/null ``` which was added in 8df194f ("[Clang] Support includes translated to module imports in -header-include-filtering=direct-per-file (#156756)"). The problem is caused by an incremental build reusing stale cached module files (.pcm) that are no longer binary-compatible with the updated compiler. Adding a new sanitizer option altered the implicit binary layout of the serialized LangOptions data structure. The build + test system is oblivious to such changes. When the new compiler attempted to read the old module file (from the previous test invocation), it misinterpreted the data due to the layout mismatch, resulting in a heap-buffer-overflow. Unfortunately Clang's PCM format does not encode nor detect version mismatches here; a more graceful failure mode would be preferable. For now, fix the test to be more robust with incremental build + test.
A recent change adding a new sanitizer kind (via Sanitizers.def) was reverted in c74fa20 ("Revert "[Clang][CodeGen] Introduce the AllocToken SanitizerKind" (llvm#162413)"). The reason was this ASan report, when running the test cases in clang/test/Preprocessor/print-header-json.c: ``` ==clang==483265==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7d82b97e8b58 at pc 0x562cd432231f bp 0x7fff3fad0850 sp 0x7fff3fad0848 READ of size 16 at 0x7d82b97e8b58 thread T0 #0 0x562cd432231e in __copy_non_overlapping_range<const unsigned long *, const unsigned long *> zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:2144:38 llvm#1 0x562cd432231e in void std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::__init_with_size[abi:nn220000]<unsigned long const*, unsigned long const*>(unsigned long const*, unsigned long const*, unsigned long) zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:2685:18 llvm#2 0x562cd41e2797 in __init<const unsigned long *, 0> zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:2673:3 llvm#3 0x562cd41e2797 in basic_string<const unsigned long *, 0> zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:1174:5 llvm#4 0x562cd41e2797 in clang::ASTReader::ReadString(llvm::SmallVectorImpl<unsigned long> const&, unsigned int&) clang/lib/Serialization/ASTReader.cpp:10171:15 llvm#5 0x562cd41fd89a in clang::ASTReader::ParseLanguageOptions(llvm::SmallVector<unsigned long, 64u> const&, llvm::StringRef, bool, clang::ASTReaderListener&, bool) clang/lib/Serialization/ASTReader.cpp:6475:28 llvm#6 0x562cd41eea53 in clang::ASTReader::ReadOptionsBlock(llvm::BitstreamCursor&, llvm::StringRef, unsigned int, bool, clang::ASTReaderListener&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) clang/lib/Serialization/ASTReader.cpp:3069:11 llvm#7 0x562cd4204ab8 in clang::ASTReader::ReadControlBlock(clang::serialization::ModuleFile&, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, clang::serialization::ModuleFile const*, unsigned int) clang/lib/Serialization/ASTReader.cpp:3249:15 llvm#8 0x562cd42097d2 in clang::ASTReader::ReadASTCore(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, clang::serialization::ModuleFile*, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, long, long, clang::ASTFileSignature, unsigned int) clang/lib/Serialization/ASTReader.cpp:5182:15 llvm#9 0x562cd421ec77 in clang::ASTReader::ReadAST(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, unsigned int, clang::serialization::ModuleFile**) clang/lib/Serialization/ASTReader.cpp:4828:11 llvm#10 0x562cd3d07b74 in clang::CompilerInstance::findOrCompileModuleAndReadAST(llvm::StringRef, clang::SourceLocation, clang::SourceLocation, bool) clang/lib/Frontend/CompilerInstance.cpp:1805:27 llvm#11 0x562cd3d0b2ef in clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<clang::IdentifierLoc>, clang::Module::NameVisibilityKind, bool) clang/lib/Frontend/CompilerInstance.cpp:1956:31 llvm#12 0x562cdb04eb1c in clang::Preprocessor::HandleHeaderIncludeOrImport(clang::SourceLocation, clang::Token&, clang::Token&, clang::SourceLocation, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2423:49 llvm#13 0x562cdb042222 in clang::Preprocessor::HandleIncludeDirective(clang::SourceLocation, clang::Token&, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2101:17 llvm#14 0x562cdb043366 in clang::Preprocessor::HandleDirective(clang::Token&) clang/lib/Lex/PPDirectives.cpp:1338:14 llvm#15 0x562cdafa84bc in clang::Lexer::LexTokenInternal(clang::Token&, bool) clang/lib/Lex/Lexer.cpp:4512:7 llvm#16 0x562cdaf9f20b in clang::Lexer::Lex(clang::Token&) clang/lib/Lex/Lexer.cpp:3729:24 llvm#17 0x562cdb0d4ffa in clang::Preprocessor::Lex(clang::Token&) clang/lib/Lex/Preprocessor.cpp:896:11 llvm#18 0x562cd77da950 in clang::ParseAST(clang::Sema&, bool, bool) clang/lib/Parse/ParseAST.cpp:163:7 [...] 0x7d82b97e8b58 is located 0 bytes after 3288-byte region [0x7d82b97e7e80,0x7d82b97e8b58) allocated by thread T0 here: #0 0x562cca76f604 in malloc zorg-test/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:67:3 llvm#1 0x562cd1cce452 in safe_malloc llvm/include/llvm/Support/MemAlloc.h:26:18 llvm#2 0x562cd1cce452 in llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long) llvm/lib/Support/SmallVector.cpp:151:15 llvm#3 0x562cdbe1768b in grow_pod llvm/include/llvm/ADT/SmallVector.h:139:11 llvm#4 0x562cdbe1768b in grow llvm/include/llvm/ADT/SmallVector.h:525:41 llvm#5 0x562cdbe1768b in reserve llvm/include/llvm/ADT/SmallVector.h:665:13 llvm#6 0x562cdbe1768b in llvm::BitstreamCursor::readRecord(unsigned int, llvm::SmallVectorImpl<unsigned long>&, llvm::StringRef*) llvm/lib/Bitstream/Reader/BitstreamReader.cpp:230:10 llvm#7 0x562cd41ee8ab in clang::ASTReader::ReadOptionsBlock(llvm::BitstreamCursor&, llvm::StringRef, unsigned int, bool, clang::ASTReaderListener&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) clang/lib/Serialization/ASTReader.cpp:3060:49 llvm#8 0x562cd4204ab8 in clang::ASTReader::ReadControlBlock(clang::serialization::ModuleFile&, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, clang::serialization::ModuleFile const*, unsigned int) clang/lib/Serialization/ASTReader.cpp:3249:15 llvm#9 0x562cd42097d2 in clang::ASTReader::ReadASTCore(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, clang::serialization::ModuleFile*, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, long, long, clang::ASTFileSignature, unsigned int) clang/lib/Serialization/ASTReader.cpp:5182:15 llvm#10 0x562cd421ec77 in clang::ASTReader::ReadAST(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, unsigned int, clang::serialization::ModuleFile**) clang/lib/Serialization/ASTReader.cpp:4828:11 llvm#11 0x562cd3d07b74 in clang::CompilerInstance::findOrCompileModuleAndReadAST(llvm::StringRef, clang::SourceLocation, clang::SourceLocation, bool) clang/lib/Frontend/CompilerInstance.cpp:1805:27 llvm#12 0x562cd3d0b2ef in clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<clang::IdentifierLoc>, clang::Module::NameVisibilityKind, bool) clang/lib/Frontend/CompilerInstance.cpp:1956:31 llvm#13 0x562cdb04eb1c in clang::Preprocessor::HandleHeaderIncludeOrImport(clang::SourceLocation, clang::Token&, clang::Token&, clang::SourceLocation, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2423:49 llvm#14 0x562cdb042222 in clang::Preprocessor::HandleIncludeDirective(clang::SourceLocation, clang::Token&, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2101:17 llvm#15 0x562cdb043366 in clang::Preprocessor::HandleDirective(clang::Token&) clang/lib/Lex/PPDirectives.cpp:1338:14 llvm#16 0x562cdafa84bc in clang::Lexer::LexTokenInternal(clang::Token&, bool) clang/lib/Lex/Lexer.cpp:4512:7 llvm#17 0x562cdaf9f20b in clang::Lexer::Lex(clang::Token&) clang/lib/Lex/Lexer.cpp:3729:24 llvm#18 0x562cdb0d4ffa in clang::Preprocessor::Lex(clang::Token&) clang/lib/Lex/Preprocessor.cpp:896:11 llvm#19 0x562cd77da950 in clang::ParseAST(clang::Sema&, bool, bool) clang/lib/Parse/ParseAST.cpp:163:7 [...] SUMMARY: AddressSanitizer: heap-buffer-overflow clang/lib/Serialization/ASTReader.cpp:10171:15 in clang::ASTReader::ReadString(llvm::SmallVectorImpl<unsigned long> const&, unsigned int&) ``` The reason is this particular RUN line: ``` // RUN: env CC_PRINT_HEADERS_FORMAT=json CC_PRINT_HEADERS_FILTERING=direct-per-file CC_PRINT_HEADERS_FILE=%t.txt %clang -fsyntax-only -I %S/Inputs/print-header-json -isystem %S/Inputs/print-header-json/system -fmodules -fimplicit-module-maps -fmodules-cache-path=%t %s -o /dev/null ``` which was added in 8df194f ("[Clang] Support includes translated to module imports in -header-include-filtering=direct-per-file (llvm#156756)"). The problem is caused by an incremental build reusing stale cached module files (.pcm) that are no longer binary-compatible with the updated compiler. Adding a new sanitizer option altered the implicit binary layout of the serialized LangOptions data structure. The build + test system is oblivious to such changes. When the new compiler attempted to read the old module file (from the previous test invocation), it misinterpreted the data due to the layout mismatch, resulting in a heap-buffer-overflow. Unfortunately Clang's PCM format does not encode nor detect version mismatches here; a more graceful failure mode would be preferable. For now, fix the test to be more robust with incremental build + test.
A recent change adding a new sanitizer kind (via Sanitizers.def) was reverted in c74fa20 ("Revert "[Clang][CodeGen] Introduce the AllocToken SanitizerKind" (llvm#162413)"). The reason was this ASan report, when running the test cases in clang/test/Preprocessor/print-header-json.c: ``` ==clang==483265==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7d82b97e8b58 at pc 0x562cd432231f bp 0x7fff3fad0850 sp 0x7fff3fad0848 READ of size 16 at 0x7d82b97e8b58 thread T0 #0 0x562cd432231e in __copy_non_overlapping_range<const unsigned long *, const unsigned long *> zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:2144:38 llvm#1 0x562cd432231e in void std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::__init_with_size[abi:nn220000]<unsigned long const*, unsigned long const*>(unsigned long const*, unsigned long const*, unsigned long) zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:2685:18 llvm#2 0x562cd41e2797 in __init<const unsigned long *, 0> zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:2673:3 llvm#3 0x562cd41e2797 in basic_string<const unsigned long *, 0> zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:1174:5 llvm#4 0x562cd41e2797 in clang::ASTReader::ReadString(llvm::SmallVectorImpl<unsigned long> const&, unsigned int&) clang/lib/Serialization/ASTReader.cpp:10171:15 llvm#5 0x562cd41fd89a in clang::ASTReader::ParseLanguageOptions(llvm::SmallVector<unsigned long, 64u> const&, llvm::StringRef, bool, clang::ASTReaderListener&, bool) clang/lib/Serialization/ASTReader.cpp:6475:28 llvm#6 0x562cd41eea53 in clang::ASTReader::ReadOptionsBlock(llvm::BitstreamCursor&, llvm::StringRef, unsigned int, bool, clang::ASTReaderListener&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) clang/lib/Serialization/ASTReader.cpp:3069:11 llvm#7 0x562cd4204ab8 in clang::ASTReader::ReadControlBlock(clang::serialization::ModuleFile&, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, clang::serialization::ModuleFile const*, unsigned int) clang/lib/Serialization/ASTReader.cpp:3249:15 llvm#8 0x562cd42097d2 in clang::ASTReader::ReadASTCore(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, clang::serialization::ModuleFile*, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, long, long, clang::ASTFileSignature, unsigned int) clang/lib/Serialization/ASTReader.cpp:5182:15 llvm#9 0x562cd421ec77 in clang::ASTReader::ReadAST(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, unsigned int, clang::serialization::ModuleFile**) clang/lib/Serialization/ASTReader.cpp:4828:11 llvm#10 0x562cd3d07b74 in clang::CompilerInstance::findOrCompileModuleAndReadAST(llvm::StringRef, clang::SourceLocation, clang::SourceLocation, bool) clang/lib/Frontend/CompilerInstance.cpp:1805:27 llvm#11 0x562cd3d0b2ef in clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<clang::IdentifierLoc>, clang::Module::NameVisibilityKind, bool) clang/lib/Frontend/CompilerInstance.cpp:1956:31 llvm#12 0x562cdb04eb1c in clang::Preprocessor::HandleHeaderIncludeOrImport(clang::SourceLocation, clang::Token&, clang::Token&, clang::SourceLocation, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2423:49 llvm#13 0x562cdb042222 in clang::Preprocessor::HandleIncludeDirective(clang::SourceLocation, clang::Token&, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2101:17 llvm#14 0x562cdb043366 in clang::Preprocessor::HandleDirective(clang::Token&) clang/lib/Lex/PPDirectives.cpp:1338:14 llvm#15 0x562cdafa84bc in clang::Lexer::LexTokenInternal(clang::Token&, bool) clang/lib/Lex/Lexer.cpp:4512:7 llvm#16 0x562cdaf9f20b in clang::Lexer::Lex(clang::Token&) clang/lib/Lex/Lexer.cpp:3729:24 llvm#17 0x562cdb0d4ffa in clang::Preprocessor::Lex(clang::Token&) clang/lib/Lex/Preprocessor.cpp:896:11 llvm#18 0x562cd77da950 in clang::ParseAST(clang::Sema&, bool, bool) clang/lib/Parse/ParseAST.cpp:163:7 [...] 0x7d82b97e8b58 is located 0 bytes after 3288-byte region [0x7d82b97e7e80,0x7d82b97e8b58) allocated by thread T0 here: #0 0x562cca76f604 in malloc zorg-test/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:67:3 llvm#1 0x562cd1cce452 in safe_malloc llvm/include/llvm/Support/MemAlloc.h:26:18 llvm#2 0x562cd1cce452 in llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long) llvm/lib/Support/SmallVector.cpp:151:15 llvm#3 0x562cdbe1768b in grow_pod llvm/include/llvm/ADT/SmallVector.h:139:11 llvm#4 0x562cdbe1768b in grow llvm/include/llvm/ADT/SmallVector.h:525:41 llvm#5 0x562cdbe1768b in reserve llvm/include/llvm/ADT/SmallVector.h:665:13 llvm#6 0x562cdbe1768b in llvm::BitstreamCursor::readRecord(unsigned int, llvm::SmallVectorImpl<unsigned long>&, llvm::StringRef*) llvm/lib/Bitstream/Reader/BitstreamReader.cpp:230:10 llvm#7 0x562cd41ee8ab in clang::ASTReader::ReadOptionsBlock(llvm::BitstreamCursor&, llvm::StringRef, unsigned int, bool, clang::ASTReaderListener&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) clang/lib/Serialization/ASTReader.cpp:3060:49 llvm#8 0x562cd4204ab8 in clang::ASTReader::ReadControlBlock(clang::serialization::ModuleFile&, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, clang::serialization::ModuleFile const*, unsigned int) clang/lib/Serialization/ASTReader.cpp:3249:15 llvm#9 0x562cd42097d2 in clang::ASTReader::ReadASTCore(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, clang::serialization::ModuleFile*, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, long, long, clang::ASTFileSignature, unsigned int) clang/lib/Serialization/ASTReader.cpp:5182:15 llvm#10 0x562cd421ec77 in clang::ASTReader::ReadAST(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, unsigned int, clang::serialization::ModuleFile**) clang/lib/Serialization/ASTReader.cpp:4828:11 llvm#11 0x562cd3d07b74 in clang::CompilerInstance::findOrCompileModuleAndReadAST(llvm::StringRef, clang::SourceLocation, clang::SourceLocation, bool) clang/lib/Frontend/CompilerInstance.cpp:1805:27 llvm#12 0x562cd3d0b2ef in clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<clang::IdentifierLoc>, clang::Module::NameVisibilityKind, bool) clang/lib/Frontend/CompilerInstance.cpp:1956:31 llvm#13 0x562cdb04eb1c in clang::Preprocessor::HandleHeaderIncludeOrImport(clang::SourceLocation, clang::Token&, clang::Token&, clang::SourceLocation, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2423:49 llvm#14 0x562cdb042222 in clang::Preprocessor::HandleIncludeDirective(clang::SourceLocation, clang::Token&, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2101:17 llvm#15 0x562cdb043366 in clang::Preprocessor::HandleDirective(clang::Token&) clang/lib/Lex/PPDirectives.cpp:1338:14 llvm#16 0x562cdafa84bc in clang::Lexer::LexTokenInternal(clang::Token&, bool) clang/lib/Lex/Lexer.cpp:4512:7 llvm#17 0x562cdaf9f20b in clang::Lexer::Lex(clang::Token&) clang/lib/Lex/Lexer.cpp:3729:24 llvm#18 0x562cdb0d4ffa in clang::Preprocessor::Lex(clang::Token&) clang/lib/Lex/Preprocessor.cpp:896:11 llvm#19 0x562cd77da950 in clang::ParseAST(clang::Sema&, bool, bool) clang/lib/Parse/ParseAST.cpp:163:7 [...] SUMMARY: AddressSanitizer: heap-buffer-overflow clang/lib/Serialization/ASTReader.cpp:10171:15 in clang::ASTReader::ReadString(llvm::SmallVectorImpl<unsigned long> const&, unsigned int&) ``` The reason is this particular RUN line: ``` // RUN: env CC_PRINT_HEADERS_FORMAT=json CC_PRINT_HEADERS_FILTERING=direct-per-file CC_PRINT_HEADERS_FILE=%t.txt %clang -fsyntax-only -I %S/Inputs/print-header-json -isystem %S/Inputs/print-header-json/system -fmodules -fimplicit-module-maps -fmodules-cache-path=%t %s -o /dev/null ``` which was added in 8df194f ("[Clang] Support includes translated to module imports in -header-include-filtering=direct-per-file (llvm#156756)"). The problem is caused by an incremental build reusing stale cached module files (.pcm) that are no longer binary-compatible with the updated compiler. Adding a new sanitizer option altered the implicit binary layout of the serialized LangOptions data structure. The build + test system is oblivious to such changes. When the new compiler attempted to read the old module file (from the previous test invocation), it misinterpreted the data due to the layout mismatch, resulting in a heap-buffer-overflow. Unfortunately Clang's PCM format does not encode nor detect version mismatches here; a more graceful failure mode would be preferable. For now, fix the test to be more robust with incremental build + test.
This PR is a follow-up to #137087 that extends the output format generated by -header-include-filtering=direct-per-file to include information about the source location where those include/imports happended, as well as include information about the imported module when an include is translated to a module import.